100% and 80% solutions

By Olin Shivers, from the preamble of the spec for the SRE regular-expression notation

There’s a problem with tool design in the free software and academic community. The tool designers are usually people who are building tools for some larger goal. For example, let’s take the case of someone who wants to do web hacking in Scheme. His Scheme system doesn’t have a sockets interface, so he sits down and hacks one up for his particular Scheme implementation. Now, socket APIs are not what this programmer is interested in; he wants to get on with things and hack the exciting stuff – his real interest is Web services. So he does a quick 80% job, which is adequate to get him up and running, and then he’s on to his original goal.

Unfortunately, his quickly-built socket interface isn’t general. It just covers the bits this particular hacker needed for his applications. So the next guy who comes along and needs a socket interface can’t use this one. Not only does it lack coverage, but the deep structure wasn’t thought out well enough to allow for quality extension. So he does his own 80% implementation. Five hackers later, five different, incompatible, ungeneral implementations have been built. No one can use each other’s code.

The alternate way systems like this end up going over a cliff is that the initial 80% system gets patched over and over again by subsequent hackers, and what results is 80% band-aids and 20% structured code. When systems evolve organically, it’s unsurprising and unavoidable that what one ends up with is a horrible design – consider the DOS -> Win95 path.

As an alternative to five hackers doing five 80% solutions to the same problem, we would be better off if each programmer picked a different task, and really thought it through – a 100% solution. Then each time a programmer solved a problem, no one else would have to redo the effort. Of course, it’s true that 100% solutions are significantly harder to design and build than 80% solutions. But they have one tremendous labor-saving advantage: you don’t have to constantly reinvent the wheel. The up-front investment buys you forward progress; you aren’t trapped endlessly reinventing the same awkward wheel.

Examples: I’ve done this three times. The first time was when I needed an Emacs mode in graduate school for interacting with Scheme processes. I looked around, and I found a snarled-up mess of many, many 80% solutions, some for Lisp, some for Scheme, some for shells, some for gdb, and so forth. These modes had all started off at some point as the original Emacs shell.el mode, then had been hacked up, eventually drifting into divergence. The keybindings had no commonality. Some modes recovered old commands with a “yank” type form, on C-c y. Some modes recovered old commands with M-p and M-n. It was hugely confusing and not very functional.

The right thing to do was to carefully implement one common base mode for process interaction, and to carefully put in hooks for customising this base mode into language-specific modes – Lisp, shell, Scheme, etc. So that’s what I did. I carefully went over the keybindings and functionality of all the process modes I could find – even going back to old Lisp Machine bindings for Zwei – and then I designed and implemented a base mode called comint. Now, all process modes are implemented on top of comint, and no one, ever, has to re-implement this code. Users only have to learn one set of bindings for the common functions. Features put into the common code are available for free to all the derived modes. Extensions are done, not by doing a completely new design, but in terms of the original system – it may not be perfect, but it’s good enough to allow people to move on and do other things.
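
As an illustration only, the "one base mode plus hooks" structure described above can be sketched in a few lines of Python. This is not comint's actual code (comint is written in Emacs Lisp); the names ProcessMode, input_filters, make_scheme_mode and make_shell_mode are hypothetical, invented for this sketch. The point it tries to show is that input history and its key bindings live once in the base, while each derived mode only registers its own hooks.

```python
from typing import Callable, Dict, List


class ProcessMode:
    """Hypothetical base mode: shared input history and shared key bindings."""

    def __init__(self, name: str, prompt: str) -> None:
        self.name = name
        self.prompt = prompt
        self.history: List[str] = []
        self.history_pos = 0
        # Hook list: derived modes append functions that rewrite input
        # before it is sent to the underlying process.
        self.input_filters: List[Callable[[str], str]] = []
        # One set of bindings for every derived mode.
        self.bindings: Dict[str, Callable[[], str]] = {
            "M-p": self.previous_input,
            "M-n": self.next_input,
        }

    def send(self, text: str) -> str:
        """Run the hooks, record the input, and return what would be sent."""
        for hook in self.input_filters:
            text = hook(text)
        self.history.append(text)
        self.history_pos = len(self.history)
        return text

    def previous_input(self) -> str:
        """Step back through the history; stop at the oldest entry."""
        if self.history_pos > 0:
            self.history_pos -= 1
        return self.history[self.history_pos] if self.history else ""

    def next_input(self) -> str:
        """Step forward; past the newest entry, return an empty line."""
        if self.history_pos < len(self.history) - 1:
            self.history_pos += 1
            return self.history[self.history_pos]
        self.history_pos = len(self.history)
        return ""


def make_scheme_mode() -> ProcessMode:
    """Derived mode: only the Scheme-specific customisation lives here."""
    mode = ProcessMode("scheme", "> ")
    mode.input_filters.append(str.strip)
    return mode


def make_shell_mode() -> ProcessMode:
    """Derived mode: a different hook, the same history and bindings."""
    mode = ProcessMode("shell", "$ ")
    mode.input_filters.append(lambda s: s.rstrip("\n"))
    return mode
```

In this sketch, a feature added to ProcessMode (say, a history search) would become available to both derived modes without either of them changing.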

The second time was the design of the Scheme Unix API found in scsh. Most Schemes have a couple of functions for changing directory, some minimal socket hacking, and perhaps forking off a shell command with the system() C function. But no one has done a complete job, and the functions are never compatible. It was a classic 80%-solution disaster. So I sat down to do a careful, 100% job – I wanted to cover everything in section 2 of the Unix man pages, in a manner that was harmonious with the deep structures of the Scheme language. As a design task, it was a tremendous amount of work, taking several years, and multiple revisions. But now it’s done. Scsh’s socket code, for instance, completely implements the socket API. My hope in doing all this was that other people could profit from my investment. If you are building your own Scheme system, you don’t have to put in the time. You can just steal the design. Or the code.

The regexp notation in this document represents a third attempt at this kind of design. Looking back, I’m amazed at how much time I poured into the design, not to mention the complete reference implementation. I sold myself on doing a serious job with the philosophy of the 100% design – the point is to save other people the trouble. If the design is good enough, then instead of having to do your own, you can steal mine and use the time saved… to do your own 100% design of something else, and fill in another gap.

I am not saying that these three designs of mine represent the last word on the issues – “100%” is really a bit of a misnomer, since no design is ever truly 100%. I would prefer to think of them as sufficiently good that they at least present low-water marks – future systems, I’d hope, can at least build upon these designs, hopefully in terms of these designs. You don’t ever have to do worse – you can just steal the design. If you don’t have a significantly better idea, I’d encourage you to adopt the design for the benefits of compatibility. If you do have an improvement, email me about it, so we can fold it into the core design and everyone can win – and we can also make your improvement part of the standard, so that people can use your good idea and still be portable.

But here’s what I’d really like: instead of tweaking regexps, you go do your own 100% design or two. Because I’d like to use them. If everyone does just one, then that’s all anyone has to do.

-Olin